Abstract

This paper considers a problem that relates to the theories of covering
arrays [5], permutation patterns [8], Vapnik-Cervonenkis (VC) classes [2], [5],
and probability thresholds [1]. Specifically, we want to find the number of
subsets of $[n] := \{1, 2, \ldots, n\}$ we need to randomly select, in a certain
probability space, so as to "shatter" all t-subsets of [n]. Moving from subsets
to words, we ask for the number of n-letter words on a q-letter alphabet that
are needed to shatter all t-subwords of the $q^n$ words of length n. Finally,
we explore the number of permutations of [n] needed to shatter (specializing
to t = 3) all length 3 permutation patterns in specified positions. We uncover
a very sharp zero-one probability threshold for the emergence of such
shattering; Talagrand's isoperimetric inequality in product spaces [1]
is used as a key tool.

1 Introduction



In this section, we give the necessary background on covering arrays,
Vapnik-Cervonenkis classes, and permutation patterns, and then explain our goals.
A $k \times n$ array with entries from the alphabet $\{0, 1, \ldots, q-1\}$ is said to
be a $(t, q, n, k, \lambda)$-covering array, or briefly a t-covering array, if for each of
the $\binom{n}{t}$ choices of t columns, each of the $q^t$ q-ary words of length t can be
found at least $\lambda$ times among the rows of the selected columns. Covering
arrays are used as valuable tools in software testing; see, e.g., [5], which is
a comprehensive survey of the theory of t-covering arrays. In this paper, we
will focus solely on the case $\lambda = 1$. If q = 2, we can interpret any row as the
characteristic vector of a subset of [n], namely the set of positions where the
row has ones. We thus have the following alternative formulation of covering
arrays: A family $\mathcal{F}$ of subsets of [n] is a t-covering array if for each
$\{a_1, \ldots, a_t\} \subseteq [n]$,

$$|\{\{a_1, \ldots, a_t\} \cap F : F \in \mathcal{F}\}| = 2^t.$$
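
As an illustration of the definition, the following minimal Python sketch checks the case $\lambda = 1$ by brute force. The function name `is_covering_array` and the small example array are illustrative choices of ours, not part of the original development, and the check is only practical for tiny arrays.

```python
from itertools import combinations

def is_covering_array(rows, t, q):
    """Return True if every choice of t columns exhibits all q**t words (lambda = 1)."""
    if not rows:
        return False
    n = len(rows[0])
    for cols in combinations(range(n), t):
        # Words seen in the selected columns, one per row.
        seen = {tuple(row[c] for c in cols) for row in rows}
        if len(seen) < q ** t:
            return False
    return True

# Hypothetical example: the four even-weight binary words of length 3.
even_weight = [(0, 0, 0), (0, 1, 1), (1, 0, 1), (1, 1, 0)]
print(is_covering_array(even_weight, 2, 2))
print(is_covering_array(even_weight, 3, 2))
```

In this example every pair of columns shows all four binary words, while the single triple of columns shows only four of the eight words of length 3.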

We next see how this definition relates to that of VC classes. 

A class $\mathcal{F}$ of subsets of an abstract set $\mathcal{Y}$ is said to shatter a subset
$A = \{a_1, \ldots, a_t\} \subseteq \mathcal{Y}$ if

$$|\{A \cap F : F \in \mathcal{F}\}| = 2^t.$$

Furthermore, the VC dimension $VC(\mathcal{F})$ of $\mathcal{F}$ [6] is the cardinality of the
smallest unshattered set (the dimension is $\infty$ if all sets of all finite sizes are
shattered). A class $\mathcal{F}$ is said to be a VC class if $VC(\mathcal{F}) < \infty$. Many canonical
examples of VC classes are driven by underlying geometric considerations.
For example, consider the infinite family $\mathcal{F}$ of subsets of $\mathcal{Y} = \mathbb{R}$ of the form
$(-\infty, x]$, $x \in \mathbb{R}$. Every set of size 1 is clearly shattered by $\mathcal{F}$. Next letting
t = 2, we consider the class of all 2-element subsets $\{a_1, a_2\}$, $a_1 < a_2$, of $\mathbb{R}$.
It is then clear that

$$|\{\{a_1, a_2\} \cap F : F \in \mathcal{F}\}| = 3 < 2^2,$$

since it is impossible for $F \in \mathcal{F}$ to intersect the two-element set $\{a_1, a_2\}$ in
its larger element $a_2$ alone. It follows that $VC(\mathcal{F}) = 2$. To give another example,
if $\mathcal{F}$ consists of all convex sets in $\mathcal{Y} = \mathbb{R}^2$, then it is impossible for elements
of $\mathcal{F}$ to "shatter" a three-element subset $A = \{a_1, a_2, a_3\}$ of collinear points
since

$$|\{A \cap F : F \in \mathcal{F}\}| = 7 < 2^3.$$

Since every 2-element set can be shattered by convex sets, we have that
$VC(\mathcal{F}) = 3$ in this case.

VC classes were first defined and used in the context of uniform limit 
theorems in Statistics [6], [13]; later, their use was extended to learning 
theory [2], [12]. The alternative (and perhaps more popular) definition of
the VC dimension of $\mathcal{F}$ is "the cardinality of the largest shattered set". In
many cases, e.g., in the first example given above, the largest shattered set 
and the smallest unshattered set differ in size by 1; in general, however, this 
is not the case, as in the second example. We will use the first definition 
of the VC dimension for reasons that will become clear in what follows, but 
which all stem from the fact that we are operating in a finite setting. 

The above discussion reveals that with $\mathcal{Y} = [n]$, an ensemble $\mathcal{F}$ of finite
subsets of [n] is a binary t-covering array if and only if $VC(\mathcal{F}) \ge t + 1$; an
explanation follows: If $\mathcal{F}$ is t-covering, then for each set A of size t and each
$B \subseteq A$, there exists $F \in \mathcal{F}$ such that $F \cap A = B$; thus every set of size t
is shattered, and the smallest unshattered set must be of size t + 1 or more.
The reverse argument is valid too.
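
The first definition of the VC dimension (size of the smallest unshattered set) can be computed by brute force for small finite families. The sketch below is only an illustration under that definition; the names `shatters` and `vc_smallest_unshattered` and the two example families are hypothetical choices of ours.

```python
from itertools import combinations

def shatters(family, subset):
    """True if the traces of the family on `subset` give all 2**|subset| subsets."""
    traces = {frozenset(subset) & frozenset(member) for member in family}
    return len(traces) == 2 ** len(subset)

def vc_smallest_unshattered(family, n):
    """Size of the smallest unshattered subset of {1..n}; None if all are shattered."""
    ground = range(1, n + 1)
    for size in range(1, n + 1):
        if any(not shatters(family, s) for s in combinations(ground, size)):
            return size
    return None

# Even-weight subsets of {1,2,3}: a binary 2-covering array, so the value is 3.
even_weight_sets = [set(), {2, 3}, {1, 3}, {1, 2}]
# Initial segments of {1,2,3}: a finite analogue of the half-lines (-oo, x].
initial_segments = [set(), {1}, {1, 2}, {1, 2, 3}]
print(vc_smallest_unshattered(even_weight_sets, 3))
print(vc_smallest_unshattered(initial_segments, 3))
```

The second family mirrors the half-line example above: every singleton is shattered, but no pair is.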

When $q \ge 3$, covering arrays are often described in terms of words. We
will use this terminology in this paper too, but the notions of shattering and
VC dimension are probably best described using the language of multisets.
We will interpret a $k \times n$ array $\{a_{ij}\}_{1 \le i \le k;\, 1 \le j \le n}$, with entries from
$\{0, 1, \ldots, q-1\}$, as consisting of k multisets, with the ith multiset containing
the element j $a_{ij}$ times, where the degree of the multiset, i.e., the maximum
number of times an element may appear in it, is bounded by q - 1. The notion
of the intersection of two multisets A, B is defined in the natural way; for example
$\{1, 1, 2, 2, 3\} \cap \{1, 1, 1, 1, 2, 3, 3\} = \{1, 1, 2, 3\}$. We say that a collection $\mathcal{F}$ of
k multisets shatters a multiset A with t distinct elements each repeated q - 1
times, if

$$|\{A \cap F : F \in \mathcal{F}\}| = q^t.$$

As before, the VC dimension $VC(\mathcal{F})$ of $\mathcal{F}$ is the cardinality of the smallest
unshattered multiset of the above type, with q fixed and the minimum taken
over t ($VC(\mathcal{F}) = \infty$ if there is no such smallest t). We thus see that $\mathcal{F}$ is a
$(t, q, n, k, 1)$-covering array if and only if $VC(\mathcal{F}) \ge t + 1$.

We next turn to permutations. The theory of permutation patterns was
initiated by Knuth [8], and continues to be an area of active investigation.
We say that a permutation $\pi \in S_n$ contains the permutation $\rho \in S_t$ if
there exist indices $1 \le i_1 < i_2 < \ldots < i_t \le n$ such that $(\pi_{i_1}, \ldots, \pi_{i_t})$ and
$(\rho_1, \ldots, \rho_t)$ are order isomorphic; if not we say that $\pi$ avoids $\rho$. Enumeration
questions are critical in this area; for example it is known that for t = 3, the
number of (i, j, k)-avoiding n-permutations is given by the Catalan numbers
$\binom{2n}{n}/(n+1)$ for each of the six choices of ijk [3]. Moreover, the Stanley-Wilf
conjecture, namely that for fixed t, the number of $\rho$-avoiding n-permutations
is asymptotic to $C^n$ for some $1 < C = C_\rho < \infty$, was recently proved by
Marcus and Tardos [10]. How might shattering and VC dimension be defined
in the case that $\mathcal{F}$ consists of an array of k n-permutations $(\pi_1, \ldots, \pi_k)$?
Using the language of covering arrays, we shall say that the VC dimension
is at least t + 1 if for each choice of t columns and $\rho \in S_t$, at least one row of
the selected columns contains entries order isomorphic to those of $\rho$. This is
equivalent to saying that the k permutations, restricted to any t positions,
shatter all the t! permutations on those positions.
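
The permutation version of shattering can be checked directly for small arrays. The following sketch is a hedged illustration of the definition just given; the helper names `pattern` and `shatters_all_positions` are ours, and permutations are assumed to be given as tuples over 1..n.

```python
from itertools import combinations, permutations

def pattern(values):
    """Order-isomorphism type of distinct numbers, as a tuple over 1..t."""
    ranks = {v: i + 1 for i, v in enumerate(sorted(values))}
    return tuple(ranks[v] for v in values)

def shatters_all_positions(perms, t):
    """True if, for every choice of t columns, the rows realize all t! patterns."""
    n = len(perms[0])
    needed = set(permutations(range(1, t + 1)))
    return all(
        {pattern([p[c] for c in cols]) for p in perms} == needed
        for cols in combinations(range(n), t)
    )

# Hypothetical example: the identity and its reverse realize both patterns of
# length 2 in every pair of positions, but only 123 and 321 in each triple.
stack = [(1, 2, 3, 4), (4, 3, 2, 1)]
print(shatters_all_positions(stack, 2))
print(shatters_all_positions(stack, 3))
```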

Research on t-covering arrays has focused on finding arrays of small size.
In [TT], for example, the case of t = 3 is studied in detail, and Roux's
result that there exist 3-covering binary arrays of size 7.5 lg n is proved, where
$\lg = \log_2$. This result was re-proved in [7] using the Lovász Local Lemma
(see p]), where the underlying probability model consisted, as in the work
of Roux, of independently placing an equal number of ones and zeros in each
column. This model is intractable for general values of t and q; accordingly,
the general upper bound on the size of covering arrays was proved in [7] by
reverting to a simple multinomial model, where each spot in the $k \times n$ array
is independently and uniformly chosen from the set $\{0, 1, \ldots, q-1\}$. But
the Lovász Lemma is an existence result whose conclusion is that there is a
positive probability that there are no "bad events," i.e., that

$$k \ge K \Rightarrow P(\text{array is } t\text{-covering}) > 0,$$

so that a t-covering array with K rows exists. By contrast, in this paper we
are looking for results, still in the log n domain, that are of the form

$$k \le k_0(n) \Rightarrow P(\text{array is } t\text{-covering}) \to 0 \quad (n \to \infty);$$

$$k \ge k_1(n) \Rightarrow P(\text{array is } t\text{-covering}) \to 1 \quad (n \to \infty),$$

and where the gap $[k_0(n), k_1(n)]$ is not too wide. We will use the simple
first moment method (linearity of expectation) together with Talagrand's
isoperimetric inequalities, to establish such a result in Section 2.






The situation is a little more nuanced when we turn to the question
of shattering permutations. First of all, we are only able to prove clean
results when t = 3, but, more importantly, it is also meaningful to consider
large arrays with small VC dimension. For example, if we wrote each of
the $\sim 4^n$ 123-avoiding n-permutations in a stack, there would be no triple
order isomorphic to 123 in any set of 3 columns, and the relevant question
would be to investigate how much better than that we could do. This is
the approach taken by Cibulka and Kyncl [3], who use the second definition
of VC dimension to give superexponential bounds on the size of the stack
for t = 3. Our motivation is to use the first definition of VC dimension to
produce logarithmic stacks with given VC dimension, with probability that
transitions from 0 to 1 in a short interval. Results along these lines are proved
in Section 3.



2 Shattering Subsets and Words 

We use the following model. Let $\mathcal{F}$ be a randomly generated stack of k
words, each of length n and obtained by selecting each position in the $k \times n$
array to independently and uniformly be one of the letters of the "alphabet"
$\{0, 1, \ldots, q-1\}$. Denote the words in $\mathcal{F}$ as $F_1, \ldots, F_k$. As noted in Section
1, if q = 2, then $\mathcal{F}$ is simply a random set system of k subsets of [n]. We will
use rows to refer to the words in $\mathcal{F}$ and columns to refer to the character
positions. In this section, we show that the threshold, under our model, for
the property "$\mathcal{F}$ shatters all t-words" (which is an alternative term we use
for multiset shattering) occurs at the level $\frac{t}{\lg(q^t/(q^t-1))}\lg(n)$; this will allow us to
determine with high probability the VC dimension of a random word array.
Deriving an upper threshold is easy:

Theorem 2.1. If $k \ge \frac{t \lg n}{\lg(q^t/(q^t-1))}(1+o(1))$, then all t-words are shattered almost
surely by $\mathcal{F}$, i.e., the probability that $\mathcal{F}$ is a covering array tends to 1 as
$n \to \infty$.

Proof. Let X be the number of sets of t columns corresponding to unshattered
t-words. By Markov's inequality, $P(X \ge a) \le E(X)/a$, valid for non-negative
random variables X, we have:

$$P(X \ge 1) \le E(X) \le \binom{n}{t} q^t \left(\frac{q^t-1}{q^t}\right)^k \le \frac{n^t}{t!}\, q^t \left(\frac{q^t-1}{q^t}\right)^k \to 0$$

provided that (with $\omega(n)$ denoting a function growing to infinity arbitrarily
slowly)

$$k \ge \frac{t \lg n + \omega(n) - \lg t! + t \lg q}{\lg(q^t/(q^t-1))} = \frac{t \lg n}{\lg(q^t/(q^t-1))}(1+o(1)) := k_1(n),$$

as asserted. □
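
To get a feel for the first moment bound, the sketch below evaluates $\binom{n}{t} q^t (1-q^{-t})^k$ and the leading term of $k_1(n)$ numerically. The function names and the sample values n = 1000, t = 3, q = 2 are illustrative assumptions of ours.

```python
from math import comb, log2

def first_moment_bound(n, k, t, q):
    """Upper bound C(n,t) * q^t * (1 - q^-t)^k on E(X), hence on P(X >= 1)."""
    return comb(n, t) * q**t * (1 - q ** (-t)) ** k

def upper_threshold(n, t, q):
    """Leading term t * lg(n) / lg(q^t / (q^t - 1)) of k_1(n)."""
    return t * log2(n) / log2(q**t / (q**t - 1))

n, t, q = 1000, 3, 2
print(upper_threshold(n, t, q))          # leading term of k_1(n)
print(first_moment_bound(n, 100, t, q))  # well below the threshold: bound exceeds 1
print(first_moment_bound(n, 310, t, q))  # about twice the threshold: bound is tiny
```

The bound is uninformative (larger than 1) for k well below the leading term and decays geometrically once k exceeds it.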



Proving that the lower threshold function $k_0(n)$ is of the same magnitude
is tantamount to showing that the random variable X is sharply concentrated
around its mean. In some sense this was done in [7], but using a naive (and
ultimately incorrect) probability model that was only shown to be valid for
t = 3. To give a more rigorous proof in this paper, we shall apply Talagrand's
inequality in the form found in pQ (other proofs are possible). This inequality
is applicable for random variables that are 1-Lipschitz:

Definition 2.2. Let Z be a random variable expressed as a function of N
independent indicator variables $I_1, \ldots, I_N$. We call Z 1-Lipschitz if

$$|Z(I_1, \ldots, I_N) - Z(I_1^*, \ldots, I_N^*)| \le 1$$

whenever $I_i \ne I_i^*$ for at most one i.

The random variable X, counting the number of sets of "defective" t
columns (i.e. those sets corresponding to unshattered t-words), depends on
nk mutually independent random variables. It is not, however, 1-Lipschitz,
since an added presence or absence of a specific character may change X by
more than 1 due to overlapping columns. However, if we define Y as the
maximum number of non-overlapping sets of "defective" columns, then Y is
1-Lipschitz.

Talagrand's inequality also involves the notion of a certification function:

Definition 2.3. Let Z be a random variable expressed as a function of N
indicator variables $I_i$, and let $f : \mathbb{N} \to \mathbb{N}$ be a function. We call Z f-certifiable
if $Z \ge s$ can always be verified to be true by f(s) of the N indicator variables.
In this case, we call f a certification function for Z.

Given the random variable Y as above, to verify that there are at least s
non-overlapping sets of unshattered t-words, it is easy to see that it suffices
to know kts of the entries in the array. Thus f is linear and f(s) = kts.

Talagrand's inequality is reproduced below for completeness:

Theorem 2.4 (Talagrand's Inequality). Let Z be a 1-Lipschitz random
variable with certification function f. Then, for all m, u > 0:

$$P\left(Z \le m - u\sqrt{f(m)}\right) P(Z \ge m) \le e^{-u^2/4}.$$

Applying Talagrand's inequality to the variable Y with m = Med(Y) (so
that $P(Y \ge m) \ge 1/2$) and $u = \sqrt{m/(kt)}$, we see that

$$P(Y = 0) \le 2e^{-m/(4kt)}. \qquad (1)$$
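
For the reader's convenience, here is the short computation behind (1), written out under the assumption that the inequality is used in its standard form with right-hand side $e^{-u^2/4}$. With the certification function $f(m) = ktm$ and the choice $u = \sqrt{m/(kt)}$,

$$u\sqrt{f(m)} = \sqrt{\frac{m}{kt}}\,\sqrt{ktm} = m, \qquad \frac{u^2}{4} = \frac{m}{4kt},$$

so that the inequality reads $P(Y \le 0)\,P(Y \ge m) \le e^{-m/(4kt)}$. Since Y is non-negative and $P(Y \ge m) \ge 1/2$ for the median m, this gives $P(Y = 0) \le 2e^{-m/(4kt)}$.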

We will use (1) in an appropriate way to get the lower threshold; specifically,
we need to derive conditions under which (i) E(Y) and E(X) are close; and
(ii) E(Y) and m are close. A series of technical lemmas that lead to (i) and
(ii) are presented next.

Lemma 2.5. Let $\Gamma$ and $\Delta$ be distinct non-disjoint sets of t columns. Let r
be the number of overlapping elements of $\Gamma$ and $\Delta$; i.e., $r = |\Gamma \cap \Delta|$. Define
the indicator random variable $I_\Gamma$ as being 1 if $\Gamma$ is missing at least one t-word
and 0 otherwise. Then

$$P(I_\Gamma I_\Delta = 1) \le q^{2t-r}\left(\frac{q^t + q^{r-t} - 2}{q^t}\right)^k \{1+o(1)\} \quad (k \to \infty).$$

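Proof. Lemma 2.5 generalizes a result in [7] and this proof is similar. Let
$A_m$ be the event that exactly m words are missing from $\Gamma$. We have

$$\begin{aligned}
P(I_\Gamma I_\Delta = 1) &= P(I_\Gamma = 1)\,P(I_\Delta = 1 \mid I_\Gamma = 1) \\
&= P(I_\Gamma = 1)\,P(I_\Delta = 1 \mid A_1 \cup A_2 \cup \cdots \cup A_{q^t-1}) \\
&= P(I_\Gamma = 1)\,\frac{P(I_\Delta = 1 \cap A_1) + \cdots + P(I_\Delta = 1 \cap A_{q^t-1})}{P(A_1 \cup A_2 \cup \cdots \cup A_{q^t-1})} \\
&\le P(I_\Gamma = 1)\left[P(I_\Delta = 1 \mid A_1) + \frac{P(A_2) + \cdots + P(A_{q^t-1})}{P(A_1 \cup \cdots \cup A_{q^t-1})}\right] \\
&\le P(I_\Gamma = 1)\left[P(I_\Delta = 1 \mid A_1) + \frac{\binom{q^t}{2}\left(\frac{q^t-2}{q^t}\right)^k}{q^t\left(\frac{q^t-1}{q^t}\right)^k - \binom{q^t}{2}\left(\frac{q^t-2}{q^t}\right)^k}\right] \\
&= P(I_\Gamma = 1)\left[P(I_\Delta = 1 \mid A_1) + \frac{q^t-1}{2}\left(\frac{q^t-2}{q^t-1}\right)^k\right]\{1+o(1)\}. \qquad (2)
\end{aligned}$$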



Since $P(I_\Gamma = 1) \le q^t(1 - q^{-t})^k$, the problem reduces to upper-bounding
$P(I_\Delta = 1 \mid A_1)$. Exactly one word is missing in $\Gamma$; let us denote that word
by $\gamma$. Assume, without loss of generality, that the first r columns of $\Delta$ are
the same as the last r columns of $\Gamma$. We consider two cases. Let $p_1$ be the
conditional probability that a word beginning with the last r characters of $\gamma$
is also missing in $\Delta$; there are $q^{t-r}$ such words. Let $p_2$ be the probability that
a word not beginning with these same r characters is missing in $\Delta$; there are
$q^t - q^{t-r}$ such words. Hence:

$$P(I_\Delta = 1 \mid A_1) \le q^{t-r} p_1 + (q^t - q^{t-r})\, p_2.$$

We first calculate $p_1$. We know that $\gamma$ is the only word missing in $\Gamma$, so
for each of the remaining $q^t - 1$ words in this first category, there exists a
row in $\Gamma$ containing that word. Take away one such row for each of these
$q^t - 1$ words; each of the other rows is randomly assigned to one of these
$q^t - 1$ words with probability $\frac{1}{q^t-1}$ each. This process enables one to realize
the probability distribution of the content of the rows of $\Gamma$ given that $A_1$ has
occurred. Let $\mathcal{A}$ be the number of rows in $\Delta$ that coincide with those of $\gamma$ in
the overlapping r positions; note that $\mathcal{A}$ is at least $q^{t-r} - 1$. Then for $a \ge 0$,

$$P(\mathcal{A} = a + q^{t-r} - 1) = \binom{k - (q^t-1)}{a}\left(\frac{q^{t-r}-1}{q^t-1}\right)^a\left(\frac{q^t - q^{t-r}}{q^t-1}\right)^{k-(q^t-1)-a}.$$

Using the binomial theorem we obtain

$$\begin{aligned}
p_1 &= \sum_{a=0}^{k-(q^t-1)} \binom{k-(q^t-1)}{a}\left(\frac{q^{t-r}-1}{q^t-1}\right)^a\left(\frac{q^t-q^{t-r}}{q^t-1}\right)^{k-(q^t-1)-a}\left(1 - \frac{1}{q^{t-r}}\right)^{a+q^{t-r}-1} \\
&= \left(\frac{(q^{t-r}-1)^2}{q^{t-r}(q^t-1)} + \frac{q^t - q^{t-r}}{q^t-1}\right)^{k-(q^t-1)}\left(\frac{q^{t-r}-1}{q^{t-r}}\right)^{q^{t-r}-1} \\
&= \left(1 - \frac{1-q^{r-t}}{q^t-1}\right)^{k}\left(1 - \frac{1-q^{r-t}}{q^t-1}\right)^{-(q^t-1)}\left(1 - \frac{1}{q^{t-r}}\right)^{q^{t-r}-1} \\
&\le \left(1 - \frac{1-q^{r-t}}{q^t-1}\right)^{k},
\end{aligned}$$

where the last inequality is valid since

$$\left(1 - \frac{1}{q^{t-r}}\right)^{q^{t-r}-1} \le \left(1 - \frac{1}{q^t-1} + \frac{1}{q^{t-r}(q^t-1)}\right)^{q^t-1},$$

which follows from the fact that the function $(1 - kx)^{1/x}$ is monotone
decreasing on the interval [0, 1] for fixed $k \in (0, 1)$.

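Repeating the process for $p_2$, let $\mathcal{B}$ be the number of rows in $\Delta$ that do
not begin with the last r characters of $\gamma$ in some fixed fashion; $\mathcal{B}$ is at least
$q^{t-r}$. Then,

$$P(\mathcal{B} = b + q^{t-r}) = \binom{k-(q^t-1)}{b}\left(\frac{q^{t-r}}{q^t-1}\right)^b\left(\frac{q^t - 1 - q^{t-r}}{q^t-1}\right)^{k-(q^t-1)-b};$$

and by the same reasoning as before,

$$p_2 = \sum_{b=0}^{k-(q^t-1)} \binom{k-(q^t-1)}{b}\left(\frac{q^{t-r}}{q^t-1}\right)^b\left(\frac{q^t-1-q^{t-r}}{q^t-1}\right)^{k-(q^t-1)-b}\left(1 - \frac{1}{q^{t-r}}\right)^{b+q^{t-r}} \le \left(1 - \frac{1}{q^t-1}\right)^k.$$

Therefore,

$$\begin{aligned}
P(I_\Delta = 1 \mid A_1) &\le q^{t-r} p_1 + (q^t - q^{t-r})\, p_2 \\
&\le q^{t-r}\left(1 - \frac{1-q^{r-t}}{q^t-1}\right)^k + (q^t - q^{t-r})\left(1 - \frac{1}{q^t-1}\right)^k \\
&= q^{t-r}\left(1 - \frac{1-q^{r-t}}{q^t-1}\right)^k\left[1 + (q^r - 1)\left(\frac{q^t-2}{q^t + q^{r-t} - 2}\right)^k\right] \\
&= q^{t-r}\left(1 - \frac{1-q^{r-t}}{q^t-1}\right)^k\{1+o(1)\} \quad (k \to \infty), \qquad (3)
\end{aligned}$$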
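and thus by (2) and (3)

$$\begin{aligned}
P(I_\Gamma I_\Delta = 1) &\le P(I_\Gamma = 1)\left[P(I_\Delta = 1 \mid A_1) + \frac{q^t-1}{2}\left(\frac{q^t-2}{q^t-1}\right)^k\right]\{1+o(1)\} \\
&\le q^t(1-q^{-t})^k\left[q^{t-r}\left(1 - \frac{1-q^{r-t}}{q^t-1}\right)^k + \frac{q^t-1}{2}\left(\frac{q^t-2}{q^t-1}\right)^k\right]\{1+o(1)\} \\
&= \left[q^{2t-r}\left(\frac{q^t+q^{r-t}-2}{q^t}\right)^k + \frac{q^t(q^t-1)}{2}\left(\frac{q^t-2}{q^t}\right)^k\right]\{1+o(1)\} \\
&= q^{2t-r}\left(\frac{q^t+q^{r-t}-2}{q^t}\right)^k\left[1 + \frac{q^t-1}{2q^{t-r}}\left(\frac{q^t-2}{q^t+q^{r-t}-2}\right)^k\right]\{1+o(1)\} \\
&= q^{2t-r}\left(\frac{q^t+q^{r-t}-2}{q^t}\right)^k\{1+o(1)\} \quad (k \to \infty).
\end{aligned}$$

This proves Lemma 2.5, our main correlation bound. □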



Continuing the quest for a lower threshold, we compare the means of
X, the variable of interest, and Y, the maximum number of disjoint
collections of unshattered t-words. Denoting the number of overlapping pairs of
unshattered t-words by Z, we have that

$$Y \le X \le Y + Z,$$

so that

$$E(X) \le E(Y) + E(Z).$$

Now, Fact 10.1 in [H] can be adapted to our case as follows:

Lemma 2.6. Let m denote the median of Y. Then

$$|E(Y) - m| \le 40\sqrt{kt\,E(Y)},$$

and so, by (1),

$$\begin{aligned}
P(X = 0) = P(Y = 0) &\le 2e^{-m/(4kt)} \\
&\le 2\exp\left\{-\frac{1}{4kt}\left\{E(Y) - 40\sqrt{kt\,E(Y)}\right\}\right\} \\
&\le 2\exp\left\{-\frac{1}{4kt}\left\{E(X) - E(Z) - 40\sqrt{kt\,E(X)}\right\}\right\}. \qquad (4)
\end{aligned}$$

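The key issue is thus to find conditions under which $E(Z) \to 0$. By
Lemma 2.5,

$$E(Z) = \sum_{\Gamma \cap \Delta \ne \emptyset} P(I_\Gamma I_\Delta = 1) \le K \sum_{r=1}^{t-1} n^{2t-r}\left(\frac{q^t + q^{r-t} - 2}{q^t}\right)^k \qquad (5)$$

for some constant $K = K_{t,q}$. The rth term in (5) tends to zero provided that

$$k \ge \frac{(2t-r)\lg n + \omega(n)}{\lg\left(\frac{q^t}{q^t + q^{r-t} - 2}\right)}, \qquad (6)$$

with $\omega(n) \to \infty$ being arbitrary. The next two lemmas enable us to determine
when (6) holds for all r.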

Lemma 2.7. The function $f : [2, t-1] \to \mathbb{R}$ defined by $f(r) = \frac{q^{r-1}-1}{r-1}$, $q \ge 2$,
is monotonically increasing.

Proof. Define $g(a) = \frac{q^a - 1}{a}$ for $1 \le a \le t - 2$. We have

$$g'(a) = \frac{a q^a \log(q) - (q^a - 1)}{a^2}.$$

The result follows since $a q^a \log q - (q^a - 1) = q^a(a \log q - 1) + 1 \ge 2(\log 2 - 1) + 1 = 2\log 2 - 1 > 0$. □

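Lemma 2.8. The constant $\frac{2t-r}{\lg\left(q^t/(q^t+q^{r-t}-2)\right)}$ indicated by (6) is largest when r = 1.

Proof. We prove that $\frac{2t-r}{\lg\left(q^t/(q^t+q^{r-t}-2)\right)} \le \frac{2t-1}{\lg\left(q^t/(q^t+q^{1-t}-2)\right)}$ for integers $q \ge 2$, $t \ge 3$,
and $r \in [2, t-1]$. This occurs if and only if

$$\left(\frac{q^t}{q^t + q^{1-t} - 2}\right)^{2t-r} \le \left(\frac{q^t}{q^t + q^{r-t} - 2}\right)^{2t-1},$$

or

$$\left(\frac{q^t + q^{r-t} - 2}{q^t + q^{1-t} - 2}\right)^{2t-1} \le \left(\frac{q^t}{q^t + q^{1-t} - 2}\right)^{r-1}.$$

Since $1 + x \le e^x$, it suffices to show that:

$$\frac{(2t-1)\, q^{1-t}\left(q^{r-1} - 1\right)}{q^t + q^{1-t} - 2} \le (r-1)\log\left(\frac{q^t}{q^t + q^{1-t} - 2}\right)$$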

or 

. /V -1 - 1\ . -i 4 1 4 

(2t-l r <2g*- 1 -l + --^ rT --. 

\ r — 1 / q l q zt 1 q 

Because 4 > and 1 + -ot=t + -< 1 + ^ + 2 < 4, it then suffices to show 

qt q 2 - 1 1 q — 32 — ' 

that 

or, by Lemma 2.7, that 

V 2 U< -V ■ ■ ■!). (7) 



t-2 



Now (7) may be verified to be true for $t \ge 4$, $q \ge 2$ and for t = 3, $q \ge 3$. The
remaining case, t = 3, q = 2, can be checked by verifying the statement of
Lemma 2.8 directly. This completes the proof. □

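By (5) and Lemma 2.8,

$$E(Z) \le K_{t,q}\, n^{2t-1}\left(\frac{q^t + q^{1-t} - 2}{q^t}\right)^k \to 0 \quad \text{if} \quad k \ge \frac{(2t-1)\lg n}{\lg\left(\frac{q^t}{q^t+q^{1-t}-2}\right)}(1+o(1)); \qquad (8)$$

the next lemma verifies the rather critical fact that this occurs for k's that
are smaller than the lower threshold we hope to exhibit.

Lemma 2.9.

$$\frac{2t-1}{\lg\left(\frac{q^t}{q^t+q^{1-t}-2}\right)} < \frac{t}{\lg\left(\frac{q^t}{q^t-1}\right)}.$$

Proof. The claim is equivalent to

$$\left(\frac{q^t}{q^t-1}\right)^{2t-1} < \left(\frac{q^t}{q^t + q^{1-t} - 2}\right)^{t},$$

or to

$$\left(\frac{q^{2t}}{q^{2t} - 2q^t + 1}\right)^{t}\left(\frac{q^t-1}{q^t}\right) < \left(\frac{q^t}{q^t + q^{1-t} - 2}\right)^{t}.$$

Using the inequalities $1 - x \le e^{-x}$ and $e^{-x} \le 1/(1+x)$, we see that

$$\frac{q^t - 1}{q^t} \le \exp\{-1/q^t\} \quad \text{and} \quad \exp\left\{-\frac{t(q-1)}{(q^t-1)^2}\right\} \le \left(\frac{q^{2t} - 2q^t + 1}{q^{2t} - 2q^t + q}\right)^{t},$$

so that it would suffice to show, on simplification, that

$$q^t\{t(q-1) + 2\} < q^{2t} + 1,$$

which is true since $q^t \ge t(q-1) + 2$; $q, t \ge 2$. □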

We are now ready to state the main result of this section. 

Theorem 2.10. Consider a $k \times n$ array with entries that are uniformly
and independently selected from $\{0, 1, \ldots, q-1\}$. Then if, for large enough
$A = A_{t,q}$,

$$k \le \frac{t \lg n - \omega(n)}{\lg\left(\frac{q^t}{q^t-1}\right)}, \quad \omega(n) \ge A \lg\lg n,$$

then the probability that the array is a $(t, q, n, k, 1)$-covering array tends to
zero as $n \to \infty$.

Proof. Equation (4) reveals that the array will be t-covering with low
probability whenever $E(Z) \to 0$ and $E(X)/kt \to \infty$. Since

$$E(X) = \binom{n}{t} q^t \left(\frac{q^t-1}{q^t}\right)^k (1+o(1)),$$

we have that $E(X)/kt \to \infty$ provided that

$$k \le \frac{t \lg n - \omega(n)}{\lg\left(\frac{q^t}{q^t-1}\right)}$$

with $\omega(n) \ge A \lg\lg n$. Thus (8) and Lemma 2.9 reveal that $P(X = 0) \to 0$ if

$$\frac{(2t-1)\lg n}{\lg\left(\frac{q^t}{q^t+q^{1-t}-2}\right)}(1+o(1)) \le k \le \frac{t \lg n - \omega(n)}{\lg\left(\frac{q^t}{q^t-1}\right)}, \quad \omega(n) \ge A \lg\lg n.$$

The full conclusion of the theorem follows by monotonicity, in k, of P(X = 0).
□

To seal the connection between covering arrays and shattering multisets,
we restate Theorem 2.10 as follows:

Theorem 2.11. Consider k multisets $\mathcal{A} = \{A_1, \ldots, A_k\}$ of [n] as follows:

(i) Each element of [n] is represented in $A_i$ at most q - 1 times; $q \ge 2$,
$1 \le i \le k$, and

(ii) The ensemble $\mathcal{A}$ is randomly generated by choosing the multiplicity
of each element j in multiset $A_i$ ($1 \le j \le n$; $1 \le i \le k$) independently and
uniformly from the set $\{0, 1, \ldots, q-1\}$.

Then the collection $\mathcal{A}$ fails to shatter all multisets of t elements, each
element repeated q - 1 times, with high probability if
$k \le (t \lg n - \omega(n))/\lg\left(\frac{q^t}{q^t-1}\right)$, $\omega(n) \ge A \lg\lg n$.

Together with Theorem 2.1, Theorems 2.10 and 2.11 show that the gap
between the lower and upper thresholds is rather small; actually this gap
arises as an artifact of the Talagrand inequality, and may in fact be artificial.
Finally, observe that we have actually uncovered a threshold for the VC
dimension of random multiset arrays. For example, if q = 2 then sets of size 3
are fully shattered with high probability (w.h.p.) at the level
$3 \lg n/\lg(8/7) \approx 15.57 \lg n$. Thus the VC dimension is 4 or more. But few sets
of size 4 are shattered at this level; they are all shattered w.h.p. when the
number of rows is of magnitude $4 \lg n/\lg(16/15) \approx 42.96 \lg n$. In between these
levels, the VC dimension of the set system is thus equal to 4 w.h.p.
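
The window structure described at the end of this section can be tabulated numerically. The sketch below is a heuristic illustration only: it uses the leading-order constants $t/\lg(q^t/(q^t-1))$ and ignores the lower-order $\omega(n)$ terms and the small gap between the thresholds; the function names are hypothetical.

```python
from math import log2

def threshold_constant(t, q):
    """Coefficient c_t in the threshold k ~ c_t * lg(n) for shattering all t-words."""
    return t / log2(q**t / (q**t - 1))

def predicted_vc_dimension(n, k, q):
    """Smallest t with k below c_t * lg(n): a leading-order guess at the VC dimension."""
    t = 1
    while k >= threshold_constant(t, q) * log2(n):
        t += 1
    return t

print(threshold_constant(3, 2))   # close to 15.57
print(threshold_constant(4, 2))   # close to 42.96
print(predicted_vc_dimension(2**20, 400, 2))
```

With n = 2^20 and k = 400 rows, k lies between 15.57 lg n and 42.96 lg n, so the heuristic returns 4, in line with the example in the text.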
3 Shattering Permutations

For permutations, we use a model analogous to the one used for words in
the previous section. Let S be a randomly generated set of k permutations
$\pi_1, \ldots, \pi_k \in S_n$, with each chosen independently with probability 1/n!. As
before, we can represent S as an array, and we will continue to use rows to
refer to the elements of S and columns to refer to the positions within each
element of S.

Shattering permutations is conceptually similar to shattering words. Let
$i_1, i_2, i_3$ be any 3 elements of [n] and let $S^*$ be the set consisting of the k
triples formed by intersecting the $i_1$th, $i_2$th, and $i_3$th columns with the k
rows of S. Then, S shatters the triple $(i_1, i_2, i_3)$ (or the positions $(i_1, i_2, i_3)$)
if $\rho \in S^*$ up to order-isomorphism for each $\rho \in S_3$.

In this section, we show that the threshold function for the property
"shatters all 3-triples" under our model is $k(n) = \frac{3}{\lg(6/5)}\lg(n)$ modulo a small
gap. We use the same approach as before, again using Markov's Inequality
for the upper threshold and Talagrand's Inequality for the lower threshold.
(The analysis becomes intractable for higher values of t, which is the size of
the tuple we wish to shatter, and we thus restrict to t = 3 in this paper.)

Theorem 3.1. If $k \ge \frac{3}{\lg(6/5)}\lg(n)(1 + o(1))$, then all triples are shattered
almost surely by S.

Proof. Let X be the number of unshattered triples. By Markov's Inequality,
we have:

$$P(X \ge 1) \le E(X) \le 6\binom{n}{3}\left(\frac{5}{6}\right)^k \to 0$$

provided that $k \ge \frac{3}{\lg(6/5)}\lg(n)(1+o(1))$. □

Now, for the lower threshold. Define Y as the maximum number of
non-overlapping sets of unshattered triples of positions. Y is clearly 1-Lipschitz
with certification function f(s) = 3ks, and as before

$$Y \le X \le Y + Z,$$

with Z defined as the number of pairs of overlapping unshattered triples of
positions. The correlation between overlapping triples in the same row is
crucial in understanding the quantity

$$E(Z) = \sum_{\Gamma \cap \Delta \ne \emptyset} P(I_\Gamma I_\Delta = 1) = \sum_{|\Gamma \cap \Delta| = 1} P(I_\Gamma I_\Delta = 1) + \sum_{|\Gamma \cap \Delta| = 2} P(I_\Gamma I_\Delta = 1),$$

where $\Gamma$ and $\Delta$ range over the set of distinct non-disjoint sets of 3 columns,
and the indicator random variable $I_\Gamma$ is defined as being 1 if $\Gamma$ is missing
at least one 3-permutation and 0 otherwise. Unlike the case of words, the
situation here is more complex. First, given an overlap size we can no longer
consider just two cases, since, e.g., two patterns ijk and abc may correspond
to overlapping last and first columns in $\Gamma$, $\Delta$ respectively, but k may equal 3
and a might be 1. Second, we can no longer assume without loss of generality,
as we did with words, that the overlap occurred in the last r columns of $\Gamma$,
and that the rest of $\Delta$ was entirely to the right of that overlap. In other
words, correlations depend not just on the magnitude of the overlap, but
its nature as well. With this in mind, let $A_m$ be the event that exactly m
3-permutations are missing from $\Gamma$, $1 \le m \le 5$, let $B_{abc} \subseteq A_1$ be the event
that the 3-permutation abc is the only permutation missing from $\Gamma$, and let
$C_{ijk}$ be the event that ijk is missing from $\Delta$. Then, we have:



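$$\begin{aligned}
P(I_\Gamma I_\Delta = 1) &= P(I_\Gamma = 1)\,P(I_\Delta = 1 \mid I_\Gamma = 1) \\
&= P(I_\Gamma = 1)\,P(I_\Delta = 1 \mid A_1 \cup A_2 \cup \cdots \cup A_5) \\
&= P(I_\Gamma = 1)\,\frac{P(I_\Delta = 1 \cap A_1) + \cdots + P(I_\Delta = 1 \cap A_5)}{P(A_1 \cup A_2 \cup \cdots \cup A_5)} \\
&\le P(I_\Gamma = 1)\left[P(I_\Delta = 1 \mid A_1) + \frac{P(A_2) + \cdots + P(A_5)}{P(A_1) + P(A_2) + \cdots + P(A_5)}\right] \\
&\le P(I_\Gamma = 1)\left[P(I_\Delta = 1 \mid A_1) + \frac{15 \cdot (4/6)^k}{6 \cdot (5/6)^k - 15 \cdot (4/6)^k}\right] \\
&= P(I_\Gamma = 1)\left[P(I_\Delta = 1 \mid B_{123} \cup B_{132} \cup B_{213} \cup B_{231} \cup B_{312} \cup B_{321}) + O((4/5)^k)\right] \\
&= P(I_\Gamma = 1)\left[\frac{P(I_\Delta = 1 \cap B_{123}) + \cdots + P(I_\Delta = 1 \cap B_{321})}{P(B_{123}) + P(B_{132}) + \cdots + P(B_{321})} + O((4/5)^k)\right] \\
&\le 6\left(\frac{5}{6}\right)^k\left[\frac{1}{6}\left(P(I_\Delta = 1 \mid B_{123}) + \cdots + P(I_\Delta = 1 \mid B_{321})\right) + O((4/5)^k)\right] \\
&\le \left(\frac{5}{6}\right)^k\left(P(C_{123} \mid B_{123}) + \cdots + P(C_{321} \mid B_{321})\right) + O((2/3)^k). \qquad (9)
\end{aligned}$$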
Accurate estimation of the quantities $P(C_{ijk} \mid B_{abc})$ will thus be critical.

Let $\Gamma$ and $\Delta$ be distinct non-disjoint sets of 3 columns. Let $D_{abc}$ and $F_{ijk}$
be the events that abc appears in a fixed row of $\Gamma$ and ijk appears in the
same row of $\Delta$. Then $P(D_{abc} \cap F_{ijk}) = A/120$ or $P(D_{abc} \cap F_{ijk}) = B/24$ in the
"one-overlap" and "two-overlap" cases respectively, where A and B are the
number of ways the two patterns can co-exist among the five or four numbers
in the two sets of columns. These numbers can, without loss of generality,
be taken to be 1, 2, 3, 4, and 5; or 1, 2, 3, and 4 in the one-overlap
and two-overlap cases respectively. Now the probability distribution of the
components of $\Gamma$ conditional on $B_{abc}$ can be obtained, as in Section 2, by
randomly selecting five rows; placing one pattern other than abc in these
rows; and randomly choosing a pattern other than abc to appear in the other
rows. For simplicity, however, we will assume that each of the k rows in $\Gamma$ is
equally likely to be chosen to be one of the non-abc patterns; it can be shown
that using this slightly incorrect conditional distribution (incorrect since it
yields a non-zero probability of there being four or fewer patterns in $\Gamma$) leads
to no change in our final conclusion. We have

$$P(C_{ijk} \mid B_{abc}) = \left(\sum_{uvw \ne abc} P(D_{uvw} \cap F^c_{ijk} \mid B_{abc})\right)^k. \qquad (10)$$

Lemma 3.2. Assume that $|\Gamma \cap \Delta| = 1$ and let $\gamma$, $\delta$ refer to the index of the
overlapping position in the two sets of columns. Then we have the probabilities
in Table 1.

    (γ, δ)    P(D_uvw ∩ F^c_ijk | B_abc)
    (1,1)     14/100
    (1,2)     17/100
    (1,3)     19/100
    (2,2)     16/100
    (2,3)     17/100
    (3,3)     14/100

Table 1: $P(D_{uvw} \cap F^c_{ijk} \mid B_{abc})$ for Overlap 1

Proof. We will exhibit just two of the calculations using two different proofs;
the other calculations may be performed similarly using either of these
methods. It is easier to calculate the probabilities $P(D_{uvw} \cap F_{ijk} \mid B_{abc})$ instead of
$P(D_{uvw} \cap F^c_{ijk} \mid B_{abc})$. First suppose that uvw = 321, ijk = 132 and (without
loss of generality) abc = 123. Let the five positions spanned by $(\Gamma, \Delta)$ be as
follows (this is the $\gamma = 2$; $\delta = 1$ case):

    Γ   3  2  1
    Δ      1     3  2


The first thing to observe is that the relative positions of the non-overlapping
indices amongst the five positions are irrelevant. Denoting the numbers in
the five positions, from smallest to largest, by 1, 2, 3, 4, and 5, we see that
the arrangement above is fulfilled by the permutations 32154, 42153, and
52143; thus $P(D_{321} \cap F_{132} \mid B_{123}) = \frac{3/120}{5/6} = \frac{3}{100}$, whence
$P(D_{321} \cap F^c_{132} \mid B_{123}) = \frac{1}{5} - \frac{3}{100} = \frac{17}{100}$. This proves the validity of the
second entry in Table 1. Let us next verify the fifth entry using another method.
Assuming that the alignment of $\Gamma$, $\Delta$ is as below

    Γ      1  3  2
    Δ   2        3  1

we choose the common element, x, and must then choose one element larger
than, and one smaller than x in the $\Gamma$ columns, and two elements smaller
than x for the columns in $\Delta$. This yields, e.g.,

$$P(D_{132} \cap F_{231}) = \cdots = \frac{1}{40},$$

so that $P(D_{132} \cap F^c_{231} \mid B_{123}) = \frac{17}{100}$, as asserted; the approximations in the
above equation array actually give an exact answer due to combinatorial
considerations. □
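
The entries of Table 1 can be checked by enumerating the 120 permutations of five values. The sketch below is a hedged illustration: it relies on the observation that only the ranks $(\gamma, \delta)$ of the shared entry matter, so it fixes the hypothetical choices uvw = 123 (shared entry in column $\gamma$ of the first triple) and a pattern ijk whose first entry is $\delta$; the name `table_entry` is ours.

```python
from fractions import Fraction
from itertools import permutations

def pattern(values):
    """Order-isomorphism type of distinct numbers, as a tuple over 1..t."""
    ranks = {v: i + 1 for i, v in enumerate(sorted(values))}
    return tuple(ranks[v] for v in values)

def table_entry(gamma, delta):
    """P(D_uvw and not F_ijk | B_abc) when the shared entry has ranks (gamma, delta)."""
    uvw = (1, 2, 3)
    ijk = {1: (1, 2, 3), 2: (2, 1, 3), 3: (3, 1, 2)}[delta]
    gamma_cols = (0, 1, 2)
    delta_cols = (gamma - 1, 3, 4)   # shares exactly one column with gamma_cols
    joint = sum(
        pattern([p[c] for c in gamma_cols]) == uvw
        and pattern([p[c] for c in delta_cols]) == ijk
        for p in permutations(range(1, 6))
    )
    # Conditioning on B_abc rescales by 6/5; P(D_uvw | B_abc) = 1/5.
    return Fraction(1, 5) - Fraction(joint, 120) * Fraction(6, 5)

for g, d in [(1, 1), (1, 2), (1, 3), (2, 2), (2, 3), (3, 3)]:
    print((g, d), table_entry(g, d))
```

The fractions are printed in lowest terms (for instance 7/50 for 14/100).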

Returning to (9), we first address the contribution of the $O((2/3)^k)$ term.
Summing this quantity over all choices of $\Gamma$, $\Delta$, we see that the net
contribution of terms corresponding to two or more permutations being absent in $\Gamma$
is negligible provided that

$$A n^5 \left(\frac{2}{3}\right)^k \to 0$$

for some constant A. This occurs if $k \ge (5 + o(1))\lg n/\lg(1.5)$, or if
$k \ge 8.55 \lg n$.

We now turn to the 36 terms in the first part of the last line in (9), each
term of which may be calculated using (10). Rather than work each term
separately, we find the worst (largest) term and use it as a bound for the
others. However, doing one calculation in detail will be instructive and seen
to be quite general. Assume that abc = 123 and ijk = 231. The indices uvw
now vary among 132, 213, 231, 312, and 321. Since $|\Gamma \cap \Delta| = 1$, we see that
no matter how the overlap occurs in the five columns that determine $\Gamma \cup \Delta$,
the indices $\gamma\delta$ consist of two dg's, two eg's and one fg, where g is the same
no matter how we vary uvw, and {d, e, f} = {1, 2, 3}. Table 1 now reveals
that with g = 1, the three relevant numbers are 14/100, 17/100, and 19/100;
and that these triads are 17/100, 16/100, and 17/100 (g = 2); and 19/100,
17/100, and 14/100 (g = 3). Maximizing over these possibilities, we see that

$$P(C_{ijk} \mid B_{abc}) \le (0.86)^k,$$

and thus for a constant B, the contribution of the single overlap case to the
correlation in (9) and thus to the value of E(Z) is $O(n^5((5/6) \cdot (0.86))^k)$, which
tends to zero if

$$k \ge 10.41 \lg n.$$

NOTE: Observe that a "global maximization" calculation with two 19/100's
and three 17/100's would have yielded $E(Z) \to 0$ if $n^5((5/6) \cdot (0.89))^k \to 0$,
or if $k \ge 11.59 \lg n$, invalidating our proposed method of proof, since the
putative lower threshold is at $3 \lg n/\lg(1.2) = 11.405 \lg n$.

Consider next the two-overlap case. Equation (10) remains unchanged,
but the conditional probabilities $P(D_{uvw} \cap F^c_{ijk} \mid B_{abc})$ have a denominator of
20. Given the four entries in $\Gamma \cup \Delta$ in the two-overlap case, the components
of the pattern uvw may be chosen in 4 ways, and it remains to calculate how
many of these are also consistent with $F_{ijk}$. There are three cases. If the
components of the two columns in the overlap are identical as, e.g., in

3 2 1 
2 1 3 

the entries in the four positions may appear in two forms, in this case 3214 
or 4213. If the components of the two columns are consistent as, e.g., in 

2 1 3 

2 3 1 

then there is only one possible arrangement, in this case 3241. Finally, if the 
components are inconsistent, consisting of one monotone increasing and one 
monotone decreasing pattern, e.g., 

1 3 2 

3 2 1 

then there is clearly no arrangement. We thus have $P(D_{uvw} \cap F_{ijk})$ equalling
2/24, 1/24, or 0 in these three cases, or $P(D_{uvw} \cap F^c_{ijk}) = 2/24$, 3/24, or 4/24
respectively. Now given a taboo index abc, a target pattern ijk and any
two overlap positions, we upper bound as follows: There can be at most
three component arrangements $uvw \ne abc$ inconsistent with ijk. Of the
remaining two arrangements uvw, the worst case is when both are consistent
with the overlapping positions. This possibility is actually realized with
ijk = 132, abc = 123, and the overlap occurring in the first and third spots
of $\Delta$. (9) and (10) thus yield for some constant C,

$$P(I_\Gamma I_\Delta = 1) \le C\left(\frac{5}{6}\right)^k\left(\frac{12}{20} + \frac{6}{20}\right)^k = C(0.75)^k,$$

and thus the contribution of the double overlap case to E(Z) is of magnitude
$n^4(0.75)^k$, which tends to zero if $k \ge 9.64 \lg n$. We are now ready to prove the
main result of this section:

Theorem 3.3. Suppose we choose k permutations randomly, uniformly,
and with replacement from $S_n$. Then the probability that they shatter all
3-permutations in any three positions $i_1 < i_2 < i_3$ tends to zero as $n \to \infty$
provided that $k \le \frac{3 \lg n - \omega(n)}{\lg(1.2)}$, where $\omega(n) \ge E \lg\lg n$ for some constant E.

Proof. By Talagrand's inequality used as in Section 2, with m = Med(Y),
we see that

$$P(Y = 0) \le 2e^{-m/(12k)},$$

and thus, since

$$|E(Y) - m| \le 40\sqrt{3k\,E(Y)},$$

it follows that

$$P(X = 0) = P(Y = 0) \le 2\exp\left(-\frac{1}{12k}\left(E(X) - E(Z) - 40\sqrt{3k\,E(X)}\right)\right).$$

As in Theorem 2.10, we thus need to verify when $E(Z) \to 0$, and $E(X)/k \to \infty$.
We have already seen that $E(Z) \to 0$ when $k \ge 10.41 \lg n$. Finally, we
have for constants D, E

$$\frac{E(X)}{k} \ge \frac{D n^3}{k}\left(\frac{5}{6}\right)^k \to \infty$$

if

$$k \le \frac{3 \lg n - \omega(n)}{\lg(1.2)},$$

with $\omega(n) \ge E \lg\lg n$. The full conclusion of the theorem, namely that
$P(X = 0) \to 0$ for $k < 10.41 \lg n$, follows by monotonicity. □
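
The numerical coefficients quoted in this section all have the form power / lg(1/base), coming from requirements of the type $n^{\text{power}} \cdot \text{base}^k \to 0$. The sketch below recomputes them as a sanity check; the helper name `lg_coefficient` and the variable names are ours, and the values are leading-order constants only.

```python
from math import log2

def lg_coefficient(power, base):
    """c such that n**power * base**k is of constant order when k = c * lg(n)."""
    return power / log2(1 / base)

two_missing = lg_coefficient(5, 2 / 3)                     # O((2/3)^k) term
single_overlap = lg_coefficient(5, (5 / 6) * 0.86)         # one-overlap case
global_max = lg_coefficient(5, (5 / 6) * 0.89)             # cruder bound from the NOTE
double_overlap = lg_coefficient(4, (5 / 6) * (18 / 20))    # two-overlap case
upper_threshold = lg_coefficient(3, 5 / 6)                 # 3 / lg(1.2)

for name, value in [("two_missing", two_missing), ("single_overlap", single_overlap),
                    ("global_max", global_max), ("double_overlap", double_overlap),
                    ("upper_threshold", upper_threshold)]:
    print(f"{name}: {value:.3f} lg n")
```

The point of the comparison is that the single-overlap coefficient (roughly 10.4) falls below the upper threshold coefficient of about 11.405, whereas the cruder bound (about 11.59) would not.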



4 Open Questions 

1. Can the results in this paper be of value in actually improving on minimum 
known sizes of covering arrays for small values of n as found, e.g., in websites 
maintained by Charlie Colbourn and the NIST? To answer this question, one 
would first need to derive exact and messy upper and lower bounds for the 
probability that a random array produces a covering, devoid of (1 + o(l)) 
terms. This can then be translated into a statement about the expected 
number of random arrays needed until a covering array can be found. Would 
this approach be computationally feasible, and for which values of q, t could 
one hope for an improvement? 

2. Can threshold results be obtained for probability models other than 
the ones adopted by us? 
3. Can the results of Section 3 be extended to all values of t? Is the
easy-to-prove upper threshold of $\frac{t \lg n}{\lg(t!/(t!-1))}$ close to the lower threshold?